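Early diagnosis of Alzheimer's disease (AD) is crucial for facilitating preventive care and delaying progression. Speech-based automatic AD screening systems provide a non-invasive and more scalable alternative to other clinical screening techniques. The scarcity of such specialist data leads to uncertainty in both model selection and feature learning. To this end, this paper investigates feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and RoBERTa pre-trained text encoders on limited data, before the resulting embedding features are fed into an ensemble of backend classifiers that produces the final AD detection decision by majority voting. Experiments conducted on the ADReSS20 Challenge dataset suggest that consistent performance improvements were obtained using model and feature combination in system development. State-of-the-art AD detection accuracies of 91.67% and 93.75% were obtained using manual and ASR speech transcripts respectively on the ADReSS20 test set of 48 elderly speakers.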
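The majority-voting step mentioned in this abstract can be illustrated with a minimal sketch. The function name, the labels and the tie-breaking rule below are assumptions made for illustration; the abstract does not describe how ties are handled or how many backend classifiers are used.

```python
from collections import Counter


def majority_vote(decisions):
    """Return the label chosen by the largest number of classifiers.

    Ties are broken in favour of the label that appears first in
    `decisions`. This tie rule is an assumption, not taken from the paper.
    """
    if not decisions:
        raise ValueError("need at least one decision")
    counts = Counter(decisions)
    top = max(counts.values())
    # Walk the decisions in order so the tie-break is deterministic.
    for label in decisions:
        if counts[label] == top:
            return label


# Hypothetical outputs of five backend classifiers for one speaker.
votes = ["AD", "non-AD", "AD", "AD", "non-AD"]
final = majority_vote(votes)
```

Using an odd number of classifiers for a two-class decision avoids ties altogether, which is one reason ensembles of this kind are often built with an odd size.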
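Early diagnosis of Alzheimer's disease (AD) is crucial for facilitating preventive care to delay further progression. This paper presents the development of a state-of-the-art Conformer-based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection. The baseline Conformer system, trained with speed perturbation and SpecAugment-based data augmentation, is significantly improved by incorporating a set of purposefully designed modeling features, including: search-based automatic configuration of domain-specific Conformer hyperparameters; fine-grained elderly speaker adaptation using learning hidden unit contributions (LHUC); and two-pass cross-system rescoring-based combination with hybrid TDNN systems. An overall word error rate reduction (34.8% relative) was obtained on the evaluation data of 48 elderly speakers. Using the final system's recognition outputs to extract textual features, the best speech recognition based AD detection accuracy of 91.7% was obtained.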
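Fundamental modeling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them. This paper investigates multi-pass rescoring and cross-adaptation based system combination approaches for hybrid TDNN and Conformer E2E ASR systems. In multi-pass rescoring, a state-of-the-art hybrid LF-MMI trained CNN-TDNN system featuring speed perturbation, SpecAugment and Bayesian learning hidden unit contributions (LHUC) speaker adaptation produces initial N-best outputs, which are then rescored by the speaker-adapted Conformer system using 2-way cross-system score interpolation. In cross adaptation, the hybrid CNN-TDNN system is adapted to the 1-best output of the Conformer system, or vice versa. Experiments conducted on the 300-hour Switchboard corpus suggest that the combined systems derived using either of the two system combination approaches outperform the individual systems on the NIST Hub5'00, RT03 and RT02 evaluation data.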
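The N-best rescoring with 2-way score interpolation described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function `rescore_nbest`, the linear interpolation form, the weight of 0.5 and all scores are hypothetical.

```python
def rescore_nbest(nbest, second_pass_score, weight=0.5):
    """Pick the hypothesis with the best interpolated log score.

    nbest: list of (hypothesis, first_pass_log_score) pairs, e.g. from a
        hybrid system.
    second_pass_score: callable returning the second-pass log score of a
        hypothesis, e.g. from an end-to-end system.
    weight: interpolation weight on the first-pass score (assumed value).
    """
    if not nbest:
        raise ValueError("N-best list is empty")
    best_hyp, best_score = None, float("-inf")
    for hyp, first_pass in nbest:
        # Linear interpolation of the two systems' log scores.
        combined = weight * first_pass + (1.0 - weight) * second_pass_score(hyp)
        if combined > best_score:
            best_hyp, best_score = hyp, combined
    return best_hyp, best_score


# Toy second-pass scores standing in for a real model (hypothetical values).
toy_second_pass = {"the cat sat": -4.0, "the cat sad": -9.0, "a cat sat": -6.0}
nbest = [("the cat sad", -5.0), ("the cat sat", -5.5), ("a cat sat", -7.0)]
best, score = rescore_nbest(nbest, toy_second_pass.get, weight=0.5)
```

In this toy example the first pass prefers "the cat sad", but the interpolated score selects "the cat sat", which shows how a second system can overturn the first system's top choice.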
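Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech. Their practical application to atypical task domains, such as elderly and disordered speech across languages, is often limited by the difficulty of collecting such specialist data from target speakers. This paper presents a cross-domain and cross-lingual acoustic-to-articulatory (A2A) inversion approach that uses the parallel audio, visual and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model training, followed by cross-domain and cross-lingual adaptation to three datasets across two languages: the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech corpora, and the English TORGO dysarthric speech data, to produce UTI-based articulatory features. Experiments conducted on the three tasks suggest that incorporating the generated articulatory features consistently outperforms the baseline hybrid TDNN and Conformer-based end-to-end systems constructed using acoustic features only, with statistically significant word error rate or character error rate reductions of up to 2.64%, 1.92% and 1.21% absolute (4.17%, 7.89% and 13.28% relative) after data augmentation and speaker adaptation are applied.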
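Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date. It is difficult to collect large quantities of such data for ASR system development because of the mobility issues often found among these users. To this end, data augmentation techniques play a vital role. In contrast to existing data augmentation techniques, which only modify the speaking rate or the overall shape of the spectral contour, this paper proposes a novel set of speaker-dependent (SD) generative adversarial network (GAN) based data augmentation approaches. These flexibly allow both: a) when parallel speech data is available, temporally or speed-perturbed normal speech spectra to be modified to be closer to those of an impaired speaker; and b) for non-parallel data, the SVD-decomposed spectral basis features of normal speech to be transformed into those of a target elderly speaker, before being recombined with the temporal bases to produce the augmented data for state-of-the-art TDNN and Conformer ASR system training. Experiments are conducted on four tasks: the English UASpeech and TORGO dysarthric speech corpora, and the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech datasets. The proposed GAN-based data augmentation approaches consistently outperform the baseline speed perturbation method, with error rate reductions of up to 4.91% and 3.0% absolute (9.61% and 6.4% relative) on the TORGO and DementiaBank data respectively. Consistent performance improvements are retained after applying LHUC-based speaker adaptation.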
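Several of these abstracts quote error rate reductions in both absolute and relative terms. The two are linked by the baseline error rate, as the short sketch below shows. The helper name and the numbers are hypothetical and are not taken from any of the papers.

```python
def error_rate_reduction(baseline, improved):
    """Return (absolute, relative) reduction, both in percentage points/percent.

    baseline, improved: error rates in percent, e.g. 40.0 for 40% WER.
    The relative reduction is the absolute reduction divided by the baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    absolute = baseline - improved
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Hypothetical example: a system improving from 40.0% to 37.0% WER.
absolute, relative = error_rate_reduction(40.0, 37.0)
```

Here the absolute reduction is 3.0 points and the relative reduction is 7.5%. The same absolute gain gives a larger relative figure on a task with a lower baseline error rate, which is why both numbers are usually reported together.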
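Traditional recommender systems face two long-standing obstacles, namely data sparsity and the cold-start problem, which have driven the emergence and development of cross-domain recommendation (CDR). The core idea of CDR is to leverage information collected from other domains to alleviate these two problems in one domain. Over the last decade, much effort has been devoted to cross-domain recommendation. Recently, with the development of deep learning and neural networks, a large number of methods have emerged. However, there are only a limited number of systematic surveys on CDR, especially regarding the most recently proposed methods and the recommendation scenarios and recommendation tasks they address. In this survey paper, we first propose a two-level taxonomy of cross-domain recommendation that classifies different recommendation scenarios and recommendation tasks. We then introduce and summarize existing cross-domain recommendation approaches under different recommendation scenarios in a structured manner. We also organize the commonly used datasets. We conclude this survey by providing several potential research directions for the field.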
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KGs) serve as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods used. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset for the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them on edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, in which a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness considerations. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. We then propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and can thus be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
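The idea of a student model mimicking a teacher can be illustrated with a generic distillation loss: the KL divergence between temperature-softened teacher and student output distributions. This is a minimal sketch of standard knowledge distillation, not of RELIANT or its fairness mechanism; the function names, the temperature and the logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    A higher temperature flattens both distributions, so the student also
    learns from the teacher's relative scores on the non-top classes.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical logits for one node over three classes.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, -0.5]
loss = distillation_loss(teacher, student)
same = distillation_loss(teacher, teacher)
```

The loss is zero when the student reproduces the teacher's distribution exactly and positive otherwise, so minimizing it pulls the student towards the teacher's behavior, including any bias the teacher carries.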